Reliability and Fault-Tolerance by Choreographic Design
Abstract
Distributed programs are hard to get right because they are required to be open, scalable, long-running, and tolerant of faults. Recent approaches to distributed software based on (micro)services, in which different services are developed independently by disparate teams, exacerbate the problem. Indeed, services are meant to be composed and run in open contexts where unpredictable behaviours can emerge. This makes it necessary to adopt suitable strategies for monitoring execution and to incorporate recovery and adaptation mechanisms so as to make distributed programs more flexible and robust. The typical approach currently adopted is to embed such mechanisms in the program logic, which makes them hard to extract, compare, and debug. We propose an approach that employs formal abstractions for specifying failure recovery and adaptation strategies. Although implementation-agnostic, these abstractions would be amenable to algorithmic synthesis of code, monitoring, and tests. We consider message-passing programs (à la Erlang, Go, or MPI) that are gaining momentum in both academia and industry. Our research agenda consists of (1) the definition of formal behavioural models encompassing failures, (2) the specification of the relevant properties of adaptation and recovery strategies, and (3) the automatic generation of monitoring, recovery, and adaptation logic in target languages of interest.
Similar resources
A Microprocessor-Based Hybrid Duplex Fault-Tolerant System
Reliability is one of the fundamental considerations in the design of industrial control equipment. The microprocessor-based Hybrid Duplex fault-tolerant System (HDS) proposed in this paper has high reliability to meet this demand although its hardware structure is simple. The hardware configuration of HDS and the fault tolerance of this system are described. The switching control strategies in...
Design, Testing, and Evaluation Techniques for Software Reliability Engineering
Software reliability is closely influenced by the creation, manifestation and impact of software faults. Consequently, software reliability can be improved by treating software faults properly, using techniques of fault tolerance, fault removal, and fault prediction. Fault tolerance techniques achieve the design for reliability, fault removal techniques achieve the testing for reliability, and ...
System Reliability, Fault Tolerance and Design Metrics Tradeoffs in the Distributed Minority and Majority Voting Based Redundancy Scheme
The distributed minority and majority voting based redundancy (DMMR) scheme was recently proposed as an efficient alternative to the conventional N-modular redundancy (NMR) scheme for the physical design of mission/safety-critical circuits and systems. The DMMR scheme enables significant improvements in fault tolerance and design metrics compared to the NMR scheme albeit at the expense of a sli...
Toward an Object-Oriented Approach to Software Fault Tolerance
Software fault tolerance is often necessary, but can itself be dangerously error-prone because of the additional effort that must be involved in the programming process. The additional redundancy may increase size and complexity and thus adversely affect software reliability. Object-oriented programming provides an appropriate framework for controlling complexity and enforcing reliability. Howe...
Keynote Speech: Design Testing and Evaluation Techniques for Software Reliability Engineering
Software reliability is closely influenced by the creation, manifestation and impact of software faults. Consequently, software reliability can be improved by treating software faults properly, using techniques of fault tolerance, fault removal, and fault prediction. Fault tolerance techniques achieve the design for reliability, fault removal techniques achieve the testing for reliability, and ...
Publication date: 2017